  • Tuesday, September 3, 2024

    Nvidia's new Blackwell chip demonstrated top per-GPU performance in MLPerf's LLM Q&A benchmark, showcasing significant advancements with its 4-bit floating-point (FP4) precision. However, competitors like Untether AI and AMD also showed promising results, particularly in energy efficiency. Untether AI's speedAI240 chip, for instance, excelled in the edge-closed category, highlighting diverse strengths across new AI inference hardware.
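
    Blackwell's 4-bit floating-point support is easiest to see as a tiny value grid. The sketch below emulates quantization to the E2M1 FP4 format (eight magnitudes per sign); real hardware additionally applies per-block scaling factors, which are omitted here, and the function name is illustrative.

```python
# Illustrative emulation of 4-bit floating-point (FP4, E2M1) quantization.
# E2M1 = 1 sign bit, 2 exponent bits, 1 mantissa bit, giving the eight
# representable magnitudes below. Hardware scaling factors are omitted.

FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_fp4(x: float) -> float:
    """Round x to the nearest representable FP4 (E2M1) value."""
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), 6.0)  # values beyond the max representable saturate
    nearest = min(FP4_GRID, key=lambda g: abs(g - mag))
    return sign * nearest

print(quantize_fp4(2.4))   # -> 2.0 (nearest grid point)
print(quantize_fp4(-5.7))  # -> -6.0
```

    The coarseness of this grid is exactly why FP4 halves memory and bandwidth relative to 8-bit formats, at the cost of precision the scaling factors must recover.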

  • Wednesday, June 19, 2024

    Systems powered by Nvidia's Hopper architecture dominated the results of two new tests from MLPerf, an AI benchmarking suite that compares the fine-tuning of large language models and training of graph neural networks.

  • Monday, September 2, 2024

    Nvidia's Blackwell chips are about twice as big as its predecessors, housing 2.6 times the number of transistors. Instead of one big piece of silicon, Blackwell consists of two advanced processors and numerous memory components joined in a single, delicate mesh of silicon, metal, and plastic. The manufacturing of each chip has to be close to perfect, presenting engineering challenges that have a sizable impact on the bottom line, with each defect rendering a $40,000 chip useless. This article looks at some of the challenges Nvidia had to overcome to produce the chip.

  • Monday, June 3, 2024

    Nvidia has unveiled a new generation of artificial intelligence chip architecture called Rubin. The company only announced its upcoming Blackwell model in March; those chips are still in production and expected to ship to customers later in 2024. Nvidia has pledged to release new AI chip models on a one-year cadence. The less-than-three-month turnaround from announcing Blackwell to announcing Rubin underscores the competitive frenzy in the AI chip market.

  • Monday, July 29, 2024

    Rumors suggest NVIDIA may introduce a new TITAN AI graphics card based on the Blackwell GPU. Tech leakers hint at this top-tier card's existence, despite NVIDIA's decision not to release a Titan variant for the RTX 40 series. The release and actual utility of such a high-performance GPU, potentially 63% faster than the RTX 4090, remain uncertain. The RTX 4090's market dominance may make a new Titan superfluous.

  • Wednesday, September 18, 2024

    Nvidia's dominance in AI chips has propelled it to immense market value, largely thanks to its GPU capabilities and CUDA software ecosystem. However, competitors like AMD, Intel, Cerebras, and SambaNova are developing innovative solutions to challenge Nvidia's supremacy in AI hardware. While Nvidia's lead remains secure for now, the landscape is dynamic, with multiple players striving to carve out their own niches in the AI market.

  • Monday, June 3, 2024

    Nvidia is reportedly preparing a system-on-chip that pairs Arm's Cortex-X5 core design with GPUs based on Nvidia's Blackwell architecture.

  • Thursday, August 15, 2024

    Nvidia has released its Llama 3.1 Minitron 4B model. By using knowledge distillation and pruning, the model scored 16% better on MMLU than an equivalent model trained from scratch, while requiring 40x fewer training tokens.
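
    As a rough illustration of the knowledge-distillation idea behind Minitron-style training (the student is trained to match the teacher's softened output distribution rather than only hard labels), a minimal pure-Python sketch might look like the following. The function names are illustrative, and Nvidia's actual pipeline also prunes the model before distilling.

```python
# Toy knowledge-distillation loss: KL divergence between the teacher's and
# student's temperature-softened output distributions. Illustrative only.
import math

def softmax(logits, temperature=1.0):
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions."""
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return sum(ti * math.log(ti / si) for ti, si in zip(t, s) if ti > 0)

# A student that matches the teacher exactly incurs zero loss:
print(distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))  # -> 0.0
```

    Because the teacher's full distribution carries more signal per example than a single hard label, the student can reach comparable quality on far fewer tokens, which is the effect the 40x figure reflects.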

  • Monday, August 5, 2024

    Nvidia's Blackwell B200 chips will take at least three months longer to produce than planned. The delay is due to a design flaw discovered unusually late in the production process. Nvidia is now working through a fresh set of test runs and won't ship large numbers of the chips until the first quarter of 2025. Microsoft, Google, and Meta have already ordered tens of billions of dollars' worth of the chips.

  • Friday, April 19, 2024

    NVIDIA's dominance in the AI space continues to be secured not just by hardware, but by its CUDA software ecosystem and proprietary interconnects. Alternatives like AMD's ROCM struggle to match CUDA's ease of use and performance optimization, ensuring NVIDIA's GPUs remain the preferred choice for AI workloads. Investments in the CUDA ecosystem and community education solidify NVIDIA's stronghold in AI compute.

  • Wednesday, April 10, 2024

    Intel has announced its new Gaudi 3 AI processors, claiming up to 1.7X the training performance, 50% better inference, and 40% better efficiency than Nvidia's H100 processors at a lower cost.

  • Wednesday, October 2, 2024

    NVIDIA has introduced NVLM 1.0, a series of advanced multimodal large language models (LLMs) that excel in vision-language tasks, competing with both proprietary models like GPT-4o and open-access models such as Llama 3-V 405B and InternVL 2. The NVLM-D-72B model, part of this release, is a decoder-only architecture that has been open-sourced for community use. Notably, NVLM 1.0 demonstrates enhanced performance on text-only tasks compared to its underlying LLM backbone after multimodal training.

    The model was trained using the Megatron-LM framework, with adaptations for hosting and inference on Hugging Face that allow for reproducibility and comparison with other models. Benchmark results indicate that NVLM-D 1.0 72B achieves impressive scores across vision-language benchmarks such as MMMU, MathVista, and VQAv2, and it also performs well on text-only benchmarks, showcasing its versatility.

    The model's architecture allows for efficient loading and inference, including support for multi-GPU setups. Instructions for preparing the environment, loading the model, and performing inference are provided, and the documentation includes detailed code snippets for loading images, preprocessing them, and interacting with the model in both pure-text dialogues and image-based conversations.

    The NVLM project is a collaborative effort, with contributions from multiple researchers at NVIDIA. The model is licensed under the Creative Commons BY-NC 4.0 license, allowing non-commercial use. The introduction of NVLM 1.0 marks a significant advancement in multimodal AI, providing powerful tools for developers and researchers alike.
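
    As an illustration of the kind of image preprocessing such documentation covers, the sketch below splits a high-resolution image into fixed-size tiles before encoding, a common step for multimodal models handling large inputs. The 448-pixel tile size and the ceiling-division grid logic are assumptions for illustration, not NVLM's exact recipe.

```python
# Toy dynamic-tiling preprocessor: cover a W x H image with fixed-size
# tiles, clipping the last row/column to the image boundary.
import math

def tile_grid(width: int, height: int, tile: int = 448):
    """Return (cols, rows) of tiles needed to cover the image."""
    return math.ceil(width / tile), math.ceil(height / tile)

def tile_boxes(width: int, height: int, tile: int = 448):
    """Yield (left, top, right, bottom) crop boxes covering the image."""
    cols, rows = tile_grid(width, height, tile)
    for r in range(rows):
        for c in range(cols):
            yield (c * tile, r * tile,
                   min((c + 1) * tile, width), min((r + 1) * tile, height))

print(tile_grid(1024, 768))              # -> (3, 2)
print(len(list(tile_boxes(1024, 768))))  # -> 6
```

    Each crop box would then be resized and fed to the vision encoder alongside a thumbnail of the whole image, letting the model see both global context and fine detail.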

  • Thursday, April 11, 2024

    Meta has announced the next generation of its AI accelerator chip. Its development focused on chip memory (128GB, built on a 5nm process) and throughput (11 TFLOPs at INT8).

  • Tuesday, June 4, 2024

    AMD unveiled its latest AI processors, including the MI325X accelerator due in Q4 2024, at the Computex trade show. It also detailed plans to compete with Nvidia by releasing new AI chips annually. The MI350 series, expected in 2025, promises a 35-fold performance increase in inference compared to the MI300 series. The MI400 series is set for a 2026 release.

  • Thursday, July 4, 2024

    Nvidia CEO Jensen Huang attributes the company's dominance of the AI chip market, where it maintains an over 80% share despite rising competition, to a decade-old strategic investment. Huang points to the cost-effectiveness and performance of Nvidia's AI chips, and highlights the firm's transformation into a data-center-focused company and its expansion into new markets.

  • Wednesday, July 17, 2024

    Vultr offers a full NVIDIA GPU stack with global access to the latest technology. With 32 cloud data center locations across 6 continents, their cloud infrastructure ensures global reach, enabling enterprises to power AI and ML at the edge efficiently. The state-of-the-art lineup of NVIDIA GPUs for AI/ML, AR/VR, high-performance computing, VDI/CAD, and more includes: NVIDIA GH200 Grace Hopper™ Superchip, NVIDIA H100 & H200 Tensor Core GPUs, NVIDIA A100 Tensor Core GPU, NVIDIA L40S GPU, NVIDIA A40 GPU, NVIDIA A16 GPU. Learn more about accelerating your organization's AI initiatives with affordable access to GPUs and begin exploring Vultr with a $250 credit.

  • Monday, July 22, 2024

    Nvidia is developing a new AI chip, the B20, tailored to comply with U.S. export controls for the Chinese market, leveraging its partnership with distributor Inspur. Its advanced H20 chip has reportedly seen a rapid growth in sales in China, with projections of selling over 1 million units worth $12 billion this year. U.S. pressure on semiconductor exports continues, with possible further restrictions and control measures on AI model development.

  • Friday, September 27, 2024

    The Vultr Cloud Alliance has formed a significant partnership with AMD to enhance high-performance artificial intelligence (AI) and high-performance computing (HPC) capabilities. The collaboration integrates AMD's Instinct™ MI300X GPU accelerators with Vultr's expansive global cloud infrastructure, creating a powerful solution tailored for enterprises across various industries.

    The MI300X GPU is designed for high processing power and substantial memory capacity, making it particularly effective for complex AI models and demanding HPC workloads. AMD's ROCm™ open software ecosystem supports major AI frameworks like PyTorch and TensorFlow, giving users flexibility and enabling rapid development.

    The integration of AMD's technology with Vultr's infrastructure allows businesses to accelerate performance, streamline operations, and reduce costs. The partnership emphasizes a composable, flexible approach to cloud solutions, enabling enterprises of all sizes to access HPC and AI capabilities without vendor lock-in. This accessibility is crucial for democratizing AI and inference, allowing even smaller enterprises to utilize advanced technologies that were previously unattainable.

    The collaboration also addresses the needs of industries including healthcare, financial services, manufacturing, energy, media, retail, and telecommunications, where businesses face common challenges in computational power, data management, and regulatory compliance. Customized solutions are provided to enhance performance and efficiency, tailored to the specific requirements of different sectors.

    With AMD's involvement in the Vultr Cloud Alliance Program, enterprises can leverage a unique combination of high-performance GPUs, open software, and flexible cloud infrastructure. This partnership aims to drive innovation, reduce costs, and make advanced AI and HPC solutions accessible to a broader range of businesses. Further information is available on the Vultr website, or potential users can reach out to the sales team for assistance.

  • Monday, April 15, 2024

    Google's new AI chip, Cloud TPU v5p, is now available. It boasts nearly triple the training speed for large language models compared to its predecessor, TPU v4. This release underscores Google's position in the AI hardware race alongside competitors like Nvidia. Google has also introduced the Google Axion CPU, based on Arm's chip infrastructure, promising better performance and energy efficiency.

  • Tuesday, May 14, 2024

    Researchers from Stanford University, focused on optimizing AI's compute usage, developed ThunderKittens, a DSL embedded in CUDA for writing efficient GPU kernels. ThunderKittens simplifies the use of hardware features like the Tensor Memory Accelerator (TMA) and warp group matrix multiply-accumulate (WGMMA) instructions, leading to significant performance improvements in Flash Attention and Based linear attention kernels.
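
    The tile-level decomposition such kernel DSLs build on can be sketched in pure Python: a matrix multiply is broken into small blocks that are accumulated independently. This shows only the decomposition; the real DSL maps tiles onto hardware features like TMA and WGMMA, which are not modeled here.

```python
# Tiled matrix multiply: iterate over tile-sized blocks of the output and
# the reduction dimension, accumulating one block product at a time. This
# is the structure GPU tile abstractions map onto hardware units.

def tiled_matmul(A, B, tile=2):
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                # Accumulate one tile-by-tile product into the C block.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        for kk in range(k0, min(k0 + tile, k)):
                            C[i][j] += A[i][kk] * B[kk][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(tiled_matmul(A, B))  # -> [[19.0, 22.0], [43.0, 50.0]]
```

    On a GPU, each tile product becomes one hardware matrix instruction over data staged in fast memory, which is why expressing kernels in tiles rather than scalars is the natural abstraction.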

  • Monday, June 17, 2024

    NVIDIA's Nemotron-4 340B is a family of open models that developers can use to generate synthetic data for training LLMs for commercial applications. The state-of-the-art reward model matches the performance of the original GPT-4 model and can run on eight H100s.

  • Tuesday, May 21, 2024

    Microsoft recently spent an entire day pitting its new hardware against the MacBook Air. Its new Surface devices, equipped with Qualcomm's Snapdragon X Elite chips, pulled ahead in tests against Apple's category-leading laptop. Microsoft's Copilot Plus PCs feature an improved emulator that can run emulated apps twice as fast as the previous generation of Windows on Arm devices, and a neural processing unit that can perform more AI operations per watt than the MacBook Air M3 and Nvidia's RTX 4060. The Copilot Plus PCs will hit the market this summer.

  • Thursday, August 8, 2024

    Nvidia is facing increased government scrutiny from the EU, UK, China, and the US Justice Department over its dominant market share in AI chips and sales practices. The company is rapidly building its legal and policy teams to address antitrust concerns amid profitable growth, as it commands 90 percent of the GPU market essential for AI systems. Nvidia is also adapting to increased competition oversight, with recent attention turning to its planned acquisition of Run.ai and impact on the AI supply chain.

  • Monday, September 9, 2024

    Intel has unveiled its Core Ultra 200V lineup, previously known as Lunar Lake, boasting superior AI performance, fast CPUs, and competitive integrated GPUs for thin laptops. The processors feature eight CPU cores, integrated memory, and enhanced efficiency but are limited to 32GB RAM. Major manufacturers like Acer, Asus, Dell, and HP will launch laptops with these new chips. Reviews are pending to confirm Intel's claims.

  • Thursday, September 19, 2024

    Qwen has released an impressive array of open models that approach the frontier of performance, with notably strong results on code, math, structured output, and reasoning. The Qwen team has also released a suite of model sizes for a variety of use cases.

  • Wednesday, June 26, 2024

    Researchers claim to have developed a method of running AI models more efficiently that involves eliminating matrix multiplication from the process. A fundamental redesign of the neural network operations that are currently accelerated by GPU chips, the method could have deep implications for the environmental impact and operational costs of AI systems. It challenges the prevailing paradigm that matrix multiplication operations are indispensable for building high-performing language models. The approach may outperform traditional large language models at very large scales, but this has not been tested due to computational constraints.
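
    The core substitution can be sketched directly: with weights constrained to {-1, 0, +1}, every multiply-accumulate in a linear layer becomes an add, a subtract, or a skip. The toy sketch below shows that substitution for a single layer; the researchers' full design (including its token mixer and fused kernels) is not reproduced here.

```python
# Matmul-free linear layer with ternary weights: y = W @ x computed using
# only additions and subtractions, never a multiply. Illustrative only.

def ternary_linear(x, W):
    """Apply a ternary weight matrix (entries in {-1, 0, 1}) to vector x."""
    y = []
    for row in W:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:        # +1 weight: add the input
                acc += xi
            elif w == -1:     # -1 weight: subtract the input
                acc -= xi
            # 0 weight: skip entirely (no work, no memory traffic)
        y.append(acc)
    return y

x = [0.5, -1.0, 2.0]
W = [[1, 0, -1],
     [-1, 1, 1]]
print(ternary_linear(x, W))  # -> [-1.5, 0.5]
```

    Eliminating the multiplier is what opens the door to much simpler, lower-power hardware than the GPU matrix units today's models depend on.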

  • Monday, September 30, 2024

    AlphaChip has significantly transformed the landscape of computer chip design through the application of advanced AI techniques. Initially introduced in a preprint in 2020, AlphaChip employs a novel reinforcement learning method to optimize chip layouts; the work has since been published in Nature and released as open-source software. This approach has enabled the creation of superhuman chip layouts that are now integral to hardware used globally.

    AlphaChip was motivated by the need to make chip design more efficient, a process that has historically been labor-intensive and time-consuming. Traditional methods can take weeks or months to produce a chip layout, whereas AlphaChip can generate comparable or superior designs in hours. This acceleration is particularly evident in the design of Google's Tensor Processing Units (TPUs), which are crucial for scaling AI models based on Google's Transformer architecture.

    AlphaChip operates by treating chip floorplanning as a game, akin to how AlphaGo and AlphaZero approached their respective games. It begins with a blank grid and strategically places circuit components, receiving rewards based on the quality of the final layout. A unique edge-based graph neural network allows AlphaChip to learn the intricate relationships between interconnected components, improving its performance with each design iteration.

    AlphaChip's impact extends beyond Google's internal projects. Companies like MediaTek have adopted and adapted it to enhance their own chip development, improving power efficiency and performance, and the technology has sparked a wave of research into AI applications for other stages of chip design, including logic synthesis and macro selection.

    Looking ahead, AlphaChip is expected to optimize every phase of the chip design cycle, from architecture to manufacturing, revolutionizing the creation of custom hardware found in everyday devices. Future iterations aim to produce chips that are faster, cheaper, and more power-efficient, benefiting applications from smartphones to medical devices. The collaborative efforts of a diverse team of researchers have been instrumental in AlphaChip's success, and as AI-driven chip design continues to evolve, AlphaChip stands at the forefront, promising to reshape the future of computing.
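
    The game framing can be illustrated with a toy sequential placer (a greedy heuristic, not AlphaChip's learned policy): each move places one component on a grid cell, and the finished layout is scored by negative total wirelength, the kind of reward signal the reinforcement learner optimizes.

```python
# Toy floorplanning-as-a-game: place components one per move on a grid,
# greedily minimizing Manhattan wirelength to already-placed neighbors,
# then score the layout. Illustrative stand-in for a learned policy.
import itertools

def neighbors(comp, nets):
    """Yield components connected to comp by some net."""
    for a, b in nets:
        if a == comp:
            yield b
        elif b == comp:
            yield a

def place(components, nets, size=4):
    """Greedily place components on a size x size grid, one per move."""
    placed = {}
    free = sorted(itertools.product(range(size), range(size)))
    for comp in components:
        def cost(cell):
            # Wirelength from this cell to already-placed neighbors.
            return sum(abs(cell[0] - placed[n][0]) + abs(cell[1] - placed[n][1])
                       for n in neighbors(comp, nets) if n in placed)
        best = min(free, key=cost)
        placed[comp] = best
        free.remove(best)
    # Reward for the finished layout: negative total wirelength.
    reward = -sum(abs(placed[a][0] - placed[b][0]) + abs(placed[a][1] - placed[b][1])
                  for a, b in nets)
    return placed, reward

layout, reward = place(["cpu", "cache", "io"], [("cpu", "cache"), ("cpu", "io")])
print(layout, reward)
```

    AlphaChip replaces the greedy choice with a graph-neural-network policy trained on that end-of-game reward, which is what lets it improve with every layout it designs.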

  • Tuesday, March 26, 2024

    Google designed the TPU v1 for fast, cost-effective inference using trained neural network models at scale. Its key feature is a focus on tensor operations, specifically matrix multiplications, which are core to neural network computations. The TPU v1 is 15-30x faster than contemporary CPUs/GPUs for inference. It has 25-29x better performance per watt than GPUs.
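
    The focus on integer matrix multiplication rests on a simple quantization scheme, sketched below: scale floats into 8-bit integers, multiply-accumulate in the integer domain, then rescale. The symmetric per-vector scheme here is a generic illustration, not the TPU's exact calibration.

```python
# 8-bit integer dot product with symmetric quantization: the cheap integer
# arithmetic is what a TPU-v1-style systolic array accelerates.

def quantize(vec):
    """Symmetric per-vector quantization of floats to int8 plus a scale."""
    scale = max(abs(v) for v in vec) / 127 or 1.0  # avoid zero scale
    return [round(v / scale) for v in vec], scale

def int8_dot(x, w):
    qx, sx = quantize(x)
    qw, sw = quantize(w)
    # Integer multiply-accumulate, dequantized by the two scales.
    return sum(a * b for a, b in zip(qx, qw)) * sx * sw

x = [0.5, -1.0, 2.0]
w = [1.0, 0.25, -0.5]
exact = sum(a * b for a, b in zip(x, w))   # float reference: -0.75
print(abs(int8_dot(x, w) - exact) < 0.02)  # -> True (small quantization error)
```

    Because inference tolerates this small rounding error, the chip can trade float units for dense 8-bit multiply-accumulate arrays, which is where the performance-per-watt advantage comes from.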

  • Thursday, April 25, 2024

    Microsoft has released a set of GPU-accelerated kernels for training BitNet-style models. These models have a substantially lower memory cost without much drop in accuracy.
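
    As an illustration of why BitNet-style models are cheap to store, the sketch below binarizes a float weight row to signs plus one scale (here, the mean absolute value) and applies it with only adds, subtracts, and one final multiply; whether this matches the released kernels' exact scheme is an assumption.

```python
# BitNet-style 1-bit weights: store only signs plus a per-row scale,
# cutting weight memory roughly 16x versus fp16. Illustrative sketch.

def binarize(weights):
    """Return (signs in {-1, +1}, scale) approximating the float weights."""
    scale = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return signs, scale

def binary_dot(x, weights):
    signs, scale = binarize(weights)
    # Only additions/subtractions, then one scaling multiply at the end.
    return scale * sum(xi if s == 1 else -xi for s, xi in zip(signs, x))

w = [0.9, -1.1, 1.0]
x = [1.0, 2.0, 3.0]
print(binary_dot(x, w))  # -> 2.0  (scale 1.0, signs [+1, -1, +1])
```

    The released kernels do this packing and accumulation on the GPU during training, so the memory savings apply to the weights while gradients still flow through the float parameters.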